Job Scheduling and Data Replication in Hierarchical Data Grid

Authors

  • Somayeh Abdi
  • Sayyed Mohsen Hashemi
Abstract

A Data Grid is a geographically distributed environment that deals with data-intensive applications in scientific and enterprise computing. In data-intensive applications, data transfer is a primary cause of job execution delay. Data access time depends on bandwidth, especially when the network has a hierarchy of bandwidths. Effective job scheduling can reduce data transfer time by taking this bandwidth hierarchy into account and by dispatching a job to where the needed data are present. Additionally, replicating data from primary repositories to other locations can be an important optimization step that reduces the frequency of remote data access. The objective of dynamic replication strategies is to reduce file access time, which in turn reduces job runtime. In this paper we develop a job scheduling policy, called SS (Scheduling Strategy), and a dynamic data replication strategy, called DRS (Distributed Replication Strategy), to improve data access efficiency in a cluster grid. We study our approach and evaluate it through simulation. The results show that the combination of SS and DRS achieves a 17% improvement over the other combinations.

Similar resources

Data Replication-Based Scheduling in Cloud Computing Environment

Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...

Improving Data Grids Performance by Using Modified Dynamic Hierarchical Replication Strategy

Abstract: A Data Grid connects a collection of geographically distributed computational and storage resources that enables users to share data and other resources. Data replication, a technique much discussed by Data Grid researchers in recent years, creates multiple copies of a file and places them in various locations to shorten file access times. In this paper, a dynamic data replication strate...

An Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity

Data grid technology, which uses the scale of the Internet to overcome storage limitations for huge amounts of data, has become a hot research topic. Recently, data replication strategies have been widely employed in distributed environments to copy frequently accessed data to suitable sites. The primary purposes are shortening the distance of file transmission and achieving files from ...

A hierarchical approach to improve job scheduling and data replication in data grid

In the dynamic environment of a data grid, effective job scheduling methods consider the location of required data when dispatching jobs to resources. Job scheduling methods are also combined with data replication mechanisms to reduce remote data access and save network bandwidth. In this paper, we combine a job scheduling method and dynamic data replication to reduce data access delay and job executi...

A New Job Scheduling in Data Grid Environment Based on Data and Computational Resource Availability

A Data Grid is an infrastructure that controls a huge amount of data files and provides intensive computational resources across geographically distributed collaborations. The heterogeneity and geographic dispersion of grid resources and applications pose complex problems such as job scheduling. Most existing scheduling algorithms in Grids focus on only one kind of Grid job, which can be data...

Hierarchical Replication Strategy for Adaptive Scoring Job Scheduling in Grid Computing

Grid technology, which connects a number of personal computer clusters with high-speed networks, can reach the same computing power as a supercomputer at minimal cost. However, a grid is a heterogeneous system, so scheduling independent tasks on it is more difficult. To utilize the power of the grid fully, we need an efficient job scheduling algorithm to execute ...

Journal title:

Volume   Issue

Pages  -

Publication date: 2012